An EPIC Processor with Pending Functional Units

Authors

  • Lori Carter
  • Weihaw Chuang
  • Brad Calder
Abstract

The Itanium processor, an implementation of an Explicitly Parallel Instruction Computing (EPIC) architecture, is an in-order processor: it fetches instructions, executes them, and forwards results to functional units in order. The architecture relies heavily on the compiler to expose Instruction Level Parallelism (ILP) and so avoid the stalls created by in-order processing. The goal of this paper is to examine, in small steps, how the in-order Itanium processor model can be changed to allow out-of-order execution, with the purpose of overcoming memory and functional unit latencies. To accomplish this, we consider an architecture with Pending Functional Units (PFU). The PFU architecture assigns (schedules) instructions to functional units in order. Instructions sit at the pending functional units until their operands become ready and then execute out of order. While an instruction is pending at a functional unit, no other instruction can be scheduled to that functional unit. We examine several PFU architecture designs. The minimal design does not perform renaming and supports bypassing of non-speculative result values only. We then make PFU more aggressive, first by supporting speculative register state and finally by adding register renaming. We show that the minimal PFU architecture provides an average speedup of 18% over an in-order EPIC processor and produces up to half of the speedup that would be gained using a full out-of-order architecture.
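The scheduling rule described in the abstract can be illustrated with a toy cycle-level model. This is only a minimal sketch under stated assumptions, not the simulator or the machine model used in the paper: the names `Instr`, `simulate`, and `PROGRAM`, the latencies, and the static mapping of each instruction to one functional unit are all hypothetical. It tracks only true (read-after-write) dependences and assumes no register is written twice, so the hazards that renaming would address never arise.

```python
from dataclasses import dataclass

INF = float("inf")

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # destination register
    srcs: tuple        # source registers
    fu: int            # functional unit this instruction needs (hypothetical static mapping)
    latency: int = 1   # cycles until the result is available (assumed >= 1)

def simulate(program, num_fus, pending=True, max_cycles=1000):
    """Return {instruction name: cycle at which it starts executing}.

    pending=True  : instructions are assigned to units in program order and
                    wait at the unit until their operands are ready.
    pending=False : in-order baseline; an instruction whose operands are not
                    ready blocks the assignment of everything behind it.
    """
    reg_ready = {}                # register -> first cycle its value can be read
    fu_free_at = [0] * num_fus    # cycle at which each unit finishes its current work
    waiting = {}                  # unit index -> instruction pending at that unit
    start = {}
    next_idx = 0
    for cycle in range(max_cycles):
        # Assign instructions to functional units strictly in program order.
        while next_idx < len(program):
            ins = program[next_idx]
            if ins.fu in waiting or fu_free_at[ins.fu] > cycle:
                break             # unit is occupied: stall assignment
            ready = all(reg_ready.get(s, 0) <= cycle for s in ins.srcs)
            if not pending and not ready:
                break             # in-order baseline stalls on missing operands
            reg_ready[ins.dest] = INF   # result unknown until the instruction runs
            waiting[ins.fu] = ins
            next_idx += 1
        # Any pending instruction whose operands are ready starts now,
        # regardless of program order.
        for fu, ins in list(waiting.items()):
            if all(reg_ready.get(s, 0) <= cycle for s in ins.srcs):
                start[ins.name] = cycle
                reg_ready[ins.dest] = cycle + ins.latency
                fu_free_at[fu] = cycle + ins.latency
                del waiting[fu]
        if next_idx == len(program) and not waiting:
            return start
    raise RuntimeError("program did not finish within max_cycles")

# Hypothetical example: unit 0 is a memory unit, units 1 and 2 are ALUs.
PROGRAM = [
    Instr("ld",  "r1", ("r0",),      fu=0, latency=4),  # long-latency load
    Instr("add", "r2", ("r1",),      fu=1),             # depends on the load
    Instr("mul", "r3", ("r4", "r5"), fu=2),             # independent work
    Instr("sub", "r6", ("r3",),      fu=2),             # depends on mul
]

if __name__ == "__main__":
    print("PFU     :", simulate(PROGRAM, 3, pending=True))
    print("in-order:", simulate(PROGRAM, 3, pending=False))
```

In this toy program, the `add` that depends on the load occupies its ALU while it waits, but the independent `mul` and `sub` proceed on the other ALU instead of stalling behind it. The model also shows the cost named in the abstract: a unit holding a pending instruction cannot accept another one.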

Similar resources

Machine-Description Driven Compilers for EPIC and VLIW Processors

In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively...

A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications

In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature sizes and increased frequency, power dissipation on the data bus has become the most important factor compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the develop...

Simple ASIC Complex ASIC RaPiD FPGA GARP DPGA SuperSpeculative RAW TRACE ( Multiscalar ) SMT VECTOR

Poor scalability of superscalar architectures with increasing instruction-level parallelism (ILP) has resulted in a trend towards statically scheduled horizontal architectures such as Very Long Instruction Word (VLIW) processors and their more sophisticated successors, called Explicitly Parallel Instruction Computing (EPIC) architectures. We extend the EPIC model with additional capabilities to...

Weld for Itanium Processor

Sharma, Saurabh. Weld for Itanium Processor. (Under the direction of Dr. Thomas M. Conte.) This dissertation extends WELD to Itanium processors. Emre Özer presented the WELD architecture in his Ph.D. thesis. WELD integrates multithreading support into an Itanium processor to hide run-time latency effects that cannot be determined by the compiler. It also proposes a hardware technique called operat...

The Itanium Processor Employs the Epic Design Style to Exploit Instruction-level Parallelism. Its Hardware and Software Work in Concert to Deliver Higher Performance through a Simpler, More Efficient

The Itanium processor is the first implementation of the IA-64 instruction set architecture (ISA). The design team optimized the processor to meet a wide range of requirements: high performance on Internet servers and workstations, support for 64-bit addressing, reliability for mission-critical applications, full IA-32 instruction set compatibility in hardware, and scalability across a range of...

Publication date: 2002